UIDS: A Multilingual Document Summarization Framework Based on Summary Diversity and Hierarchical Topics
Authors
Abstract
In this paper, we put forward UIDS, a new, high-performing, extensible framework for extractive multilingual document summarization. Our approach treats a document in a multilingual corpus as a set of item sequences, in which each sentence is an item sequence and each item is a minimal semantic unit. We then formalize extractive summarization as a summary diversity sampling problem that considers topic diversity and redundancy at the same time: topic diversity is captured with hierarchical topic models, redundancy is captured with sentence similarity, and summary diversity is enhanced with Determinantal Point Processes. Finally, we illustrate how this method yields a framework for computing summaries of both multilingual single documents and multiple documents. Experiments on the MultiLing summarization task datasets demonstrate the effectiveness of our approach.
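The abstract does not spell out how a Determinantal Point Process balances sentence quality against redundancy. The following is a minimal sketch under stated assumptions, not the UIDS implementation. It assumes that each sentence is already a feature vector (for instance, its topic proportions under a hierarchical topic model) and that each sentence has a precomputed quality score. Redundancy is measured with cosine similarity, and the kernel uses the common quality/diversity decomposition L = diag(q) S diag(q). The helper names `build_kernel` and `greedy_dpp_select` are hypothetical. Greedy log-determinant maximization is a standard approximation to DPP MAP inference and may differ from the paper's own sampling procedure.

```python
import numpy as np


def build_kernel(features: np.ndarray, quality: np.ndarray) -> np.ndarray:
    """Quality/diversity DPP kernel L = diag(q) @ S @ diag(q).

    S is the cosine-similarity matrix of the (hypothetical) sentence vectors.
    It is a Gram matrix of unit vectors, so L is positive semidefinite.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # avoid division by zero
    similarity = unit @ unit.T
    return quality[:, None] * similarity * quality[None, :]


def greedy_dpp_select(kernel: np.ndarray, k: int) -> list[int]:
    """Greedily add the sentence that maximizes log det(L_Y) until k are chosen."""
    selected: list[int] = []
    remaining = list(range(kernel.shape[0]))
    for _ in range(min(k, kernel.shape[0])):
        best, best_logdet = None, -np.inf
        for i in remaining:
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(kernel[np.ix_(idx, idx)])
            # A non-positive determinant means the candidate is (numerically)
            # redundant with the sentences already chosen, so it is skipped.
            if sign > 0 and logdet > best_logdet:
                best, best_logdet = i, logdet
        if best is None:
            break  # every remaining candidate is fully redundant
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Hypothetical topic-space vectors: sentence 1 nearly duplicates sentence 0.
    features = np.array([
        [1.00, 0.0, 0.0],
        [0.99, 0.1, 0.0],
        [0.00, 1.0, 0.0],
        [0.00, 0.0, 1.0],
    ])
    quality = np.array([1.0, 0.95, 0.8, 0.5])
    print(greedy_dpp_select(build_kernel(features, quality), k=3))  # [0, 2, 3]
```

In this toy example, sentence 1 has the second-highest quality score, yet it is passed over. It nearly duplicates sentence 0, so including it would drive the determinant of the selected submatrix toward zero. This is how a DPP rewards quality while penalizing redundancy.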
Similar resources
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text while preserving the original ideas. The textual content on the web, in particular, is growing at an exponential rate. Sifting through such a massive amount of data to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
AllSummarizer system at MultiLing 2015: Multilingual single and multi-document summarization
In this paper, we evaluate our automatic text summarization system in a multilingual context. We participated in both the single-document and multi-document summarization tasks of the MultiLing 2015 workshop. Our method involves clustering the document sentences into topics using a fuzzy clustering algorithm. Then each sentence is scored according to how well it covers the various topics. This is done us...
Multilingual Multi-document Summarization with Enhanced hLDA Features
This paper presents state-of-the-art research progress on multilingual multi-document summarization. Our method first uses the hLDA (hierarchical Latent Dirichlet Allocation) algorithm to model the documents. A new feature, which can reflect semantic information to some extent, is derived from the hLDA modeling results. This new feature is then combined with various other features to perfor...
Impact of Document Structure on Hierarchical Summarization
The hierarchical summarization technique summarizes a large document based on the document's hierarchical structure and salient features. A previous study has shown that hierarchical summarization is a promising technique that can effectively extract the most important information from the source document. Hierarchical summarization has been extended to the summarization of multiple documents. Thre...
Bringing Summarization to End Users: Semantic Assistants for Integrating NLP Web Services and Desktop Clients
We present PathSum, a high-performing, hierarchical-topic-based single- and multi-document automatic text summarization framework. This approach leverages Bayesian nonparametric methods to model sentences as paths through a tree and to create a hierarchy of topics from the input in an unsupervised setting. We describe the generative model used to learn a topic tree based on hierarchical latent Dirich...